Abstract - Many oil companies operate in Macae-RJ, in 
the Campos basin, and they value workplace safety and 
the lives of their employees. These companies carry out a 
study of the health of their employees' spines, which 
results in a database with six attributes: pelvic incidence, 
pelvic inclination, lumbar lordosis angle, sacral 
inclination, pelvic radius and degree of spondylolisthesis. 
In Brazil, Social Security recently released statistical 
studies showing back pain as the leading cause of leave 
from work in the first half of 2016, a fact that directly 
affects the productivity of companies and the health of 
their employees. This article applies the KDD process, 
specifically the Data Mining task of classification, i.e., 
classifying whether the employee is fit or unfit for the 
job. The decision tree, built with the J48 algorithm, was 
the technique chosen to examine the options for treating 
employees, for prevention and for improving the working 
environment; a change in management was even made 
based on the results found. The results pointed to 
inadequate staff postures, inadequate service stations, 
lack of training in equipment handling and lack of 
knowledge about cargo handling. 
Keywords - Data Mining; KDD; Spine; Occupational 
Accidents. 

I. INTRODUCTION 

In 1943, the Consolidation of Labor Laws (CLT) was 
approved and sanctioned by the then president of the 
republic, Getulio Vargas. One of its chapters dealt with 
occupational safety and medicine, establishing 
coordination, orientation, control and supervision of 
activities related to occupational health and safety 
throughout the national territory, including the National 
Campaign for the Prevention of Accidents at Work. In 
addition, it established as a duty of companies "to instruct 
the employees, through service orders, as to the 
precautions to be taken to avoid accidents at work or 
occupational diseases". With the CLT, Vargas would go 
down in history as the benefactor of the working class 
(FALEIROS, 2002). 

The area of work safety directly affects the overall 
productivity of a company: when an accident leads to time 
off from work, the sector deprived of the employee 
operates below its normal production capacity. In Brazil, 
the National Health Survey indicates that more than 20 
million people suffer from some chronic disease of the 
spine (FALEIROS, 2002). 

The pain can be aggravated by several factors, such as 
stress, excess weight or smoking. In more severe cases, 
repetitive movements, overload and poor posture can lead 
to scoliosis (curving of the spine) or even herniated discs. 
Depending on the position and function of the worker, 
other conditions may also develop, such as RSI / DORT 
(Repetitive Strain Injuries / Work-related Musculoskeletal 
Disorders). 

The International Classification of Impairments and 
Disabilities of the World Health Organization recognizes 
low back pain as an impairment that reveals loss or 
abnormality of the lumbar spine structure, of 
psychological, physiological or anatomical etiology, or as 
a disability that prevents the full performance of work 
activities (WORLD HEALTH ORGANIZATION, 1980). 

Schilling, in 1984, proposed a classification of work- 
related diseases divided into three groups: 

I. diseases that have work as the necessary cause, such 
as occupational accidents and occupational diseases 
legally recognized; 

II. diseases that have work as one of the contributing 
factors; 

III. diseases in which work aggravates or provokes 
latent or pre-existing disorders. 

Using the Schilling classification, occupational low 
back pain can be classified as Schilling II when work is 
considered one of the contributing factors for its onset, or 
Schilling III when work is considered an aggravating 
factor of a pre-existing disorder or pathology. 

One way to avoid this kind of problem in companies is 
to establish a Health and Safety Management (TSS) 
system that considers several aspects, such as the direct 
and indirect risks to which the worker is exposed. In 
parallel, encouraging other actions, such as the Internal 
Week for the Prevention of Work Accidents (SIPAT) or 
Labor Gymnastics, can reduce the incidence of these 
diseases. Besides lowering the rate of leave from work, 
OSH-related activities broaden the awareness of workers, 
who tend to apply the knowledge in their personal lives as 
well. 

This study applies KDD (Knowledge Discovery in 
Databases) to the database of a multinational oil company 
that operates in Macae-RJ, in the Campos basin, using six 
attributes: pelvic incidence, pelvic inclination, lumbar 
lordosis angle, sacral inclination, pelvic radius and degree 
of spondylolisthesis. 

The KDD process involves several steps, ranging from 
understanding the problem to be solved to extracting 
knowledge through data mining techniques. KDD, 
proposed in 1989, is according to Fayyad et al. (1996, 
quoted by Iiebstein, 2005) a non-trivial process of 
identifying patterns that are valid, novel, potentially 
useful and understandable. This involves finding and 
interpreting patterns in the data, iteratively and 
interactively, by repeating the algorithms and analyzing 
their results. 

Spinal diseases and the resulting time away from work 
generate a very high cost for the company. To support 
their prevention, this article uses the data mining task of 
classification to determine whether an employee is fit or 
unfit for work. 

What all companies have in common, regardless of 
the method used, is the need to analyse the economic 
efficiency of a particular project: when all costs are 
grouped, the total must be less than the estimated value 
the project can generate as revenue or benefits. 

II. METHODOLOGY 

The study uses a database of a multinational oil 
company operating in Macae-RJ, in the Campos basin, 
related to the spine, with six main parameters: pelvic 
incidence, pelvic inclination, lumbar lordosis angle, 
sacral inclination, pelvic radius and degree of 
spondylolisthesis. 

The open source data mining tool used was WEKA 
version 3.8 and the implemented algorithm was J48. The 
tool was developed by the University of Waikato in New 
Zealand. It can be defined as a collection of machine 
learning algorithms to perform data mining tasks. 

WEKA has been increasingly applied and some 
interesting features help to explain its success 
(MURASSE and TSUNODA apud MARKOV and 
RUSSELL, 2006): 

- Contains several algorithms for data mining, web 
mining and machine learning; 

- It is open source and is available on the Web for free; 

- It is relatively easy to use, even by people who are not 
experts; 

- Provides flexible resources for experiments; 

- It is kept updated, since new algorithms are added as 
soon as they appear in the literature. 

The data mining process comprises the following 
steps (MURASSE and TSUNODA apud MARKOV and 
RUSSELL, 2006): 

- Survey the data sources (databases, reports, etc.); 

- Clean the data so that it can be loaded into WEKA; 

- Load the cleaned data file into WEKA; 

- Search for patterns relevant to the problem in question 
using the algorithms embedded in the software. 
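
The cleaning and loading steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the attribute names, the class labels "fit" and "unfit", the relation name and the sample rows are hypothetical and are not taken from the company database. The output follows the ARFF text format that WEKA reads.

```python
import csv
import io

# Hypothetical column names: the paper lists the six measurements
# but does not give the names used in the actual file.
ATTRIBUTES = [
    "pelvic_incidence",
    "pelvic_inclination",
    "lumbar_lordosis_angle",
    "sacral_inclination",
    "pelvic_radius",
    "degree_spondylolisthesis",
]
CLASSES = ["fit", "unfit"]  # assumed class labels


def clean_rows(rows):
    """Keep only rows with six numeric values and a known class label."""
    cleaned = []
    for row in rows:
        if len(row) != len(ATTRIBUTES) + 1:
            continue  # wrong number of columns
        *values, label = (cell.strip() for cell in row)
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            continue  # missing or non-numeric measurement
        if label.lower() not in CLASSES:
            continue  # unknown class label
        cleaned.append((numbers, label.lower()))
    return cleaned


def to_arff(cleaned, relation="spine"):
    """Render cleaned rows as ARFF text (header plus data section)."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in ATTRIBUTES]
    lines.append("@attribute class {" + ",".join(CLASSES) + "}")
    lines += ["", "@data"]
    for numbers, label in cleaned:
        lines.append(",".join(f"{n:g}" for n in numbers) + "," + label)
    return "\n".join(lines) + "\n"


# Made-up rows for illustration only; the second one has a missing value
# and is dropped by the cleaning step.
raw = io.StringIO(
    "63.0,22.5,39.6,40.5,98.7,-0.25,Unfit\n"
    "40.2,,37.9,26.3,130.1,1.1,fit\n"
    "38.5,16.9,35.1,21.5,127.6,7.9,fit\n"
)
arff_text = to_arff(clean_rows(csv.reader(raw)))
print(arff_text)
```

The resulting text could be saved with an .arff extension and opened in WEKA; how the authors actually cleaned their data is not described in the article.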

The Ministry of Labor and Employment and Social 
Security (MTPS) maintains a website, a friendly and 
interactive environment for the user, with a set of data on 
the main causes of leave from work throughout the 
country. From it, the information on the main causes was 
downloaded, with the spine (low back pain) ranking in 
first place. 

Knowledge discovery on the database drew on the 
results of X-ray and Magnetic Resonance examinations, 
together with the implementation of a database with 6 
attributes, in order to better monitor one of the main 
causes of low productivity in the company: leave from 
work, in the period from June 2, 2014 to October 31, 
2016. The focus was specifically on the spine (low back 
pain), which, according to data from the Ministry of 
Labor and Employment and Social Security, is the 
leading cause of leave from work. A total of 310 
employees were investigated, covering the members of 
the company's staff, and recorded in the dataset. The 
dataset was requested by the occupational physician 
together with a colleague orthopedist and the Labor 
Safety Engineer, who is responsible, together with the 
occupational physician, for implementing mitigating 
measures that may reduce leave from work. The 
occupational physician, who works as coordinator of the 
Medical Control and Occupational Health Program 
(NR07) of the company studied, is a member of the 
company's Specialized Medical and Occupational Safety 
Engineering (NR04) sector. 

In the J48 algorithm, the decision tree is modeled 
based on the most significant attribute, which appears as 
the root of the tree. From this root, branches are 
generated, which represent the relevance of this 
connection. These branches can in turn generate other 
branches that work the same way. Such a structure can 
then represent, intuitively, where knowledge can be 
extracted. 

Goldshmidt (2005) says that decision trees are also 
known as regression trees, or even classification trees, 
and that they are graphical representations of a set of 
rules, consisting of a root, branches and nodes, similar to 
a tree, where the analysis of these representations must be 
performed from the top down to the leaves. In these 
decision trees, the internal nodes test the values of the 
attributes of the database and the leaf nodes hold the 
outcome for the instances that reach them; that is, each of 
the decisions taken to carry out the classification pertains 
to a single node. 

The J48 algorithm generates decision tree models 
from the top down, so that at each node the remaining 
attributes are evaluated individually to determine how 
significant they are for that branch, or whether they 
should appear in it at all. 
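
J48 is WEKA's implementation of the C4.5 algorithm, which ranks candidate splits by gain ratio. A minimal sketch of that criterion for a single numeric threshold is given here to make the idea of "most significant attribute" concrete. The sample values and labels are made up for illustration, and the real J48 adds further steps (such as pruning and minimum leaf sizes) that are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs. value > threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the threshold does not split the data
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info


# Hypothetical measurements of one attribute and their class labels.
values = [2, 5, 12, 25, 30, 41]
labels = ["fit", "fit", "fit", "unfit", "unfit", "unfit"]

# A threshold that separates the classes cleanly scores higher than one
# that isolates a single instance.
print(gain_ratio(values, labels, 19.85))
print(gain_ratio(values, labels, 3))
```

The attribute and threshold with the best score would be placed at the current node, and the procedure would be repeated on each branch.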

III. RESULTS AND DISCUSSION 

The tree shows that when the degree of 
spondylolisthesis is less than or equal to 19.85° and the 
pelvic radius is greater than 125.21°, the employee is 
considered fit. 

If the degree of spondylolisthesis is greater than 
19.85°, the employee is considered unfit. 



Fig.2: Decision tree. Source: WEKA - Algorithm J48. 


When the degree of spondylolisthesis is less than or 
equal to 19.85°, the pelvic radius is less than or equal to 
125.21°, the sacral inclination is less than or equal to 
40.47° and the pelvic inclination is greater than 9.97°, the 
employee is considered unfit. 

www.iiaers.com 

When the degree of spondylolisthesis is less than or 
equal to 19.85°, the pelvic radius is less than or equal to 
125.21° and the sacral inclination is greater than 40.47°, 
the outcome depends on further tests. If the degree of 
spondylolisthesis is greater than 9.06°, the employee is 
considered unfit. If, on the other hand, the degree of 
spondylolisthesis is less than or equal to 9.06° and the 
pelvic inclination is less than or equal to 18.89°, the 
employee is considered fit. If the pelvic inclination is 
greater than 18.89° and the lumbar lordosis angle is 
greater than 56.3°, the employee is considered fit. If the 
lumbar lordosis angle is less than or equal to 56.3° and 
the pelvic inclination is greater than 65.01°, the employee 
is considered unfit; if the pelvic inclination is less than or 
equal to 65.01°, the employee is considered fit. 
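
The rules read from the tree can be written as a small function, which makes the order of the tests explicit. This is a transcription of the rules as stated in the text, not the WEKA output itself. The function and parameter names are hypothetical, the "unfit" test above 19.85 is taken as the complement of the "less than or equal to 19.85" branch, and the one branch the text does not describe returns None.

```python
def classify(spondylolisthesis, pelvic_radius, sacral_inclination,
             pelvic_inclination, lumbar_lordosis):
    """Return "fit", "unfit", or None for a branch the text leaves unstated."""
    if spondylolisthesis > 19.85:
        return "unfit"
    if pelvic_radius > 125.21:
        return "fit"
    # From here on: spondylolisthesis <= 19.85 and pelvic radius <= 125.21.
    if sacral_inclination <= 40.47:
        if pelvic_inclination > 9.97:
            return "unfit"
        return None  # pelvic inclination <= 9.97 is not described in the text
    # Sacral inclination above 40.47.
    if spondylolisthesis > 9.06:
        return "unfit"
    if pelvic_inclination <= 18.89:
        return "fit"
    if lumbar_lordosis > 56.3:
        return "fit"
    # The text states this last test on the pelvic inclination.
    return "unfit" if pelvic_inclination > 65.01 else "fit"


# Illustrative values only, not records from the study.
print(classify(25.0, 110.0, 30.0, 15.0, 40.0))  # above 19.85
print(classify(5.0, 130.0, 30.0, 15.0, 40.0))   # pelvic radius above 125.21
```

Writing the rules this way also exposes the gap in the description: the case of a low sacral inclination combined with a pelvic inclination of 9.97 or less has no stated outcome.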


The analysis of the decision tree built by the J48 
algorithm shows that employees diagnosed with 
spondylolisthesis are considered unfit for work. Within 
the clinical investigation, we tried to relate complaints of 
lumbar pain to their cause. Given the attributes analyzed 
and the so-called normal parameters, low-grade 
spondylolisthesis associated with a sacral inclination 
outside the normal range also makes the employee unfit. 

When the initial diagnosis in the clinical investigation 
is inconclusive for spondylolisthesis, the analysis begins 
with the pelvic radius. When it is outside the normal 
range, the analysis moves on to the sacral inclination, and 
when this is associated with spondylolisthesis or pelvic 
inclination, the employee is also considered unfit. 

Employees who, during the clinical investigation, 
showed a sacral inclination associated with pelvic 
inclination were also unfit to perform their activities, as 
were those who presented lordosis associated with pelvic 
incidence outside the normal range. 

This analysis is useful for a company, since it allows a 
physical evaluation profile to be developed that is 
compatible with the requirements of the desired position, 
avoiding sick leave of the employee, loss of productive 
capacity and, consequently, losses to the company. 

IV. FINAL CONSIDERATIONS 

This study presents an analysis of the factors that lead 
to leave from work, decreasing the productivity of 
companies and increasing the burden on social security 
agencies such as the INSS. To identify these factors, a 
database with several attributes that could influence leave 
from work due to spinal pain was used. This was possible 
using the KDD process, which extracted rules showing 
the possible causes most likely to lead to leave from work 
due to the spine, through six attributes. This result allows 
the company to take mitigating actions to reduce leave 
from work due to low back pain and preventive actions 
such as an Ergonomic Work Analysis (AET) and the 
specific training recommended in MTPS NR-17. 

The results obtained were also decisive for the 
Specialized Service in Medicine and Engineering of 
Work Safety of the oil company studied, helping it 
comply with the ergonomic procedures on all the 
platforms where it provides services, and also helping to 
identify which functions and areas require greater care, 
such as, for example, workers in the area of cargo 
handling. 

Thus, the oil company may include in its work safety 
and admission procedures a stricter approach to hiring, 
favoring jobs that allow changes of position during the 
work day, with the addition of pauses that keep the body 
from wearing out too much. 

To this end, jobs must be designed to adapt to the 
anthropometric variations of the workers, avoiding the 
transportation of very heavy loads or for long journeys, 
and applying, whenever necessary, rigor in admission 
examinations compatible with the position to be 
occupied, in order to avoid future damage to the physical 
and mental health of the employee and to the finances of 
the company. 

Thus, this article demonstrates the effectiveness of the 
J48 classifier in building a decision tree with the six 
attributes, with 81.61% of the instances correctly 
classified, a satisfactory result for decision making. 
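
As a quick consistency check, the reported percentage can be related to the 310 employees in the dataset. The sketch assumes that each of the 310 instances was counted once in the evaluation; the article does not say how the evaluation was carried out, so the counts below are an inference and not a reported result.

```python
total_instances = 310      # employees in the dataset, as stated in the text
reported_accuracy = 81.61  # percent of instances correctly classified

# Assumption: every one of the 310 instances was evaluated exactly once.
correct = round(total_instances * reported_accuracy / 100)
incorrect = total_instances - correct

# Recompute the percentage from the inferred counts.
accuracy_check = round(100 * correct / total_instances, 2)
print(correct, incorrect, accuracy_check)
```

Under that assumption the percentage corresponds to 253 correctly classified and 57 misclassified instances.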